
Cythonized GroupBy Fill #19673

Merged - 25 commits, merged Feb 25, 2018
Conversation

@WillAyd (Member) commented Feb 13, 2018

I am not a fan of how I've implemented this in groupby.py, but I think this change highlights even more the need for some refactoring of that module, specifically in how Cython transformations are getting dispatched. This works in the meantime, but I'm open to any feedback on how the methods are getting wired back to the Cython layer.

Below are ASV benchmark results for the change:

       before           after         ratio
     [d9551c8e]       [5e007f86]
+      81.6±0.3μs         96.7±2μs     1.19  groupby.GroupByMethods.time_method('float', 'count')
+         455±3μs         516±20μs     1.13  groupby.GroupByMethods.time_method('float', 'cummin')
+       306±0.6μs          340±5μs     1.11  groupby.GroupByMethods.time_method('float', 'prod')
-       142±0.4ms          280±6μs     0.00  groupby.GroupByMethods.time_method('int', 'bfill')
-        230±10ms          327±7μs     0.00  groupby.GroupByMethods.time_method('float', 'bfill')
-       135±0.8ms        189±0.3μs     0.00  groupby.GroupByMethods.time_method('int', 'ffill')
-         227±2ms        184±0.6μs     0.00  groupby.GroupByMethods.time_method('float', 'ffill')


sorted_labels = np.argsort(labels)
if method == 'bfill':
sorted_labels[::-1].sort()
Contributor:

Why are the labels reversed before sorting?

Member Author:

They only get reversed in the case of bfill - it's the cheapest way to fill backwards that I could think of

Contributor:

Right, but doesn't the .sort() undo the reversing?

Member Author:

I took that from the SO question below - the comment by perimosocordiae was helpful in figuring out what's going on. Open to changing it to sorted_labels = np.sort(sorted_labels)[::-1] for readability if you'd like.

https://stackoverflow.com/questions/26984414/efficiently-sorting-a-numpy-array-in-descending-order

Contributor:

Oh, thanks! Perhaps too clever, though - how about this? It still avoids the copy:

sorted_labels.sort()
sorted_labels = sorted_labels[::-1]
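For reference, a quick standalone NumPy check (not the PR code) that both variants sort descending without copying the underlying data:

```python
import numpy as np

a = np.array([3, 1, 2])
a[::-1].sort()   # sorts the reversed view ascending in place, so `a` ends up descending
print(a)         # [3 2 1]

b = np.array([3, 1, 2])
b.sort()         # ascending, in place
b = b[::-1]      # the reversed slice is a view - still no copy of the data
print(b)         # [3 2 1]
```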

Member Author:

Actually I think you are on to something - the extra sort doesn't feel right and I think it could cause a bug as it will really just return an array ranging from N-1..0. Going to work up some test cases to validate and add fix to next commit
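A small demonstration of the suspected bug (standalone sketch, not the PR code): descending-sorting the argsort output just yields N-1..0 regardless of the labels, whereas reversing the argsort keeps rows grouped:

```python
import numpy as np

labels = np.array([0, 1, 0, 1])
sorted_labels = np.argsort(labels, kind='stable')  # [0, 2, 1, 3]: group 0 rows, then group 1 rows

buggy = sorted_labels.copy()
buggy[::-1].sort()   # descending sort of a permutation of 0..N-1
print(buggy)         # [3 2 1 0] - independent of labels, grouping is lost

fixed = np.argsort(labels, kind='stable')[::-1]
print(fixed)         # [3 1 2 0] - each group's rows, in reverse order
```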

out[idx, 0] == {{nan_val}}
filled_vals += 1
else: # reset items when not missing
filled_vals = 0
Contributor:

Won't filled_vals need to be tracked per group? Consider this case (I haven't tested it on your PR):

df = pd.DataFrame({'k': [1, 1, 2, 2, 1, 1],
                   'v': [1.1, np.nan, 1, np.nan, np.nan, np.nan]})

df.groupby('k').ffill(limit=2)
Out[25]:
   k    v
0  1  1.1
1  1  1.1
2  2  1.0
3  2  1.0
4  1  1.1
5  1  NaN

Member Author:

It needs to be tracked by both group and value. One thing to keep in mind is that the main loop in the Cython function doesn't go sequentially over the values in the series / frame, but rather iterates over the argsorted labels, which makes it easy to keep track of groups. Just ran your example on my PR and it gave the result you have above.

That said, I don't have a problem adding a test case that mixes the groups up to prevent this from breaking in the future. Can bundle that in with the next commit

Member Author:

The test case for this is now a little more complicated, handling both sequential groups (ex: ['a', 'a', 'a', 'b', 'b', 'b']) and "interwoven" (ex: ['a', 'b', 'a', 'b', 'a', 'b']). Open to splitting it up into separate tests if you think that would make it more readable.

Otherwise this instance should be covered. I fixed a bug in the Cython code to go along with it - thanks for the callout!
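A test along those lines - the same fill applied to sequential and interwoven group labels - might look like the following (the data values here are chosen for illustration; the actual test in the PR may differ):

```python
import numpy as np
import pandas as pd

# Sequential groups: ['a', 'a', 'a', 'b', 'b', 'b']
df_seq = pd.DataFrame({'key': ['a', 'a', 'a', 'b', 'b', 'b'],
                       'val': [1.0, np.nan, np.nan, 2.0, np.nan, np.nan]})

# Interwoven groups: ['a', 'b', 'a', 'b', 'a', 'b']
df_mix = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'a', 'b'],
                       'val': [1.0, 2.0, np.nan, np.nan, np.nan, np.nan]})

res_seq = df_seq.groupby('key')['val'].ffill(limit=1)
res_mix = df_mix.groupby('key')['val'].ffill(limit=1)

# With limit=1, only the first NaN after a valid value is filled per group,
# regardless of how the groups are interleaved in the frame
print(res_seq.tolist())  # [1.0, 1.0, nan, 2.0, 2.0, nan]
print(res_mix.tolist())  # [1.0, 2.0, 1.0, 2.0, nan, nan]
```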

@@ -1472,7 +1479,7 @@ def pad(self, limit=None):
Series.fillna
DataFrame.fillna
"""
return self.apply(lambda x: x.ffill(limit=limit))
return self.apply('ffill', limit=limit)
Contributor:

Rather than calling apply, the pattern so far has been to define the method directly, e.g. GroupBy.cumsum, etc.

Member Author:

Tied to the comment below - I added _cython_apply as its own function to handle dispatching back to the Cython layer; this apply tries it first before falling back to the current method of iterating over the groups and applying the method directly from there. Items like GroupBy.cumsum call _cython_transform in a similar fashion, which I was trying to emulate here, but I was hesitant to send these methods down the transform route because of the complexity that would cause there.

@@ -2032,6 +2039,38 @@ def _get_group_keys(self):
self.levels,
self.labels)

def _cython_apply(self, ftype, data, axis, **kwargs):
Contributor:

I think all this should pass down to _cython_transform?

Member Author:

I was a little torn over this. The call signature doesn't play all that nicely with the other methods utilizing _cython_transform. Specifically, the is_numeric and is_datetimelike arguments aren't applicable and there are some conditionals that need to be added to prevent accidental casting of data during the fill. On top of that, we'd have to wrap the transformation to be sure to include the grouped columns as none of the other transformations do that.

Touched on this with @jreback in the comments of #19481 - there for sure needs to be a pretty comprehensive refactoring on groupby.py to more effectively dispatch Cython operations. For the time being I thought this approach would be the "best of the worst" option, but can take another look at wiring into transform if we feel that is a show stopper

Contributor:

I don't disagree the whole thing needs some refactoring, xref also to #19354 - I'll think about it a bit more too

@WillAyd changed the title from "Grp fill perf" to "Cythonized GroupBy Fill" on Feb 13, 2018
@chris-b1 (Contributor):

One possibility to clean this up would be to replicate GroupBy.shift

def shift(self, periods=1, freq=None, axis=0):

Rather than type-specific functions, there the Cython routine computes a single indexer, and then the normal take functions are used to produce the actual output.

def group_shift_indexer(int64_t[:] out, int64_t[:] labels,

Probably a bit slower than what you do here, but I would be surprised if it's bad.
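The shift-style design the comment describes - a routine that computes only an int64 indexer, with the normal take machinery producing the output - can be sketched in pure Python/NumPy. The helper name and the dict-based group tracking here are illustrative, not the actual Cython implementation:

```python
import numpy as np

def group_ffill_indexer(labels, mask, limit=-1):
    """For each row, the index of the value to take, or -1 if nothing fillable."""
    out = np.empty(len(labels), dtype=np.int64)
    last = {}    # group label -> index of last non-missing value
    filled = {}  # group label -> consecutive fills since last non-missing value
    for i, (lab, is_na) in enumerate(zip(labels, mask)):
        if not is_na:
            last[lab] = i
            filled[lab] = 0
            out[i] = i
        elif lab in last and (limit < 0 or filled[lab] < limit):
            filled[lab] += 1
            out[i] = last[lab]
        else:
            out[i] = -1  # leading NaN in group, or limit exhausted
    return out

values = np.array([1.0, np.nan, np.nan, 2.0, np.nan])
labels = np.array([0, 0, 0, 1, 1])
idx = group_ffill_indexer(labels, np.isnan(values), limit=1)
# idx is [0, 0, -1, 3, 3]; a take then materializes the filled values
result = np.where(idx >= 0, values[np.maximum(idx, 0)], np.nan)
# result: 1.0, 1.0, nan, 2.0, 2.0
```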

@WillAyd (Member Author) commented Feb 13, 2018

Thanks for the idea. I think I could make something like that work here and that would definitely simplify things

@WillAyd (Member Author) commented Feb 13, 2018

Just simplified the code to mirror what we do for shift. The groupby.py changes are MUCH cleaner now and there doesn't appear to be any significant change to benchmarks either. Updated results below:

       before           after         ratio
     [07137a5a]       [0939aeae]
+           983ms            1.23s     1.25  groupby.GroupByMethods.time_method('float', 'pct_change')
+         356±4μs         443±20μs     1.24  groupby.GroupByMethods.time_method('float', 'prod')
+      84.6±0.2μs          101±3μs     1.19  groupby.GroupByMethods.time_method('int', 'count')
+         185±1μs          218±3μs     1.18  groupby.GroupByMethods.time_method('int', 'cumcount')
+         161±3μs          185±3μs     1.15  groupby.GroupByMethods.time_method('float', 'cumcount')
+         334±9μs         382±20μs     1.14  groupby.GroupByMethods.time_method('float', 'median')
+         461±3μs         523±10μs     1.13  groupby.GroupByMethods.time_method('int', 'cummax')
+       344±0.8μs         387±10μs     1.12  groupby.GroupByMethods.time_method('float', 'nunique')
+      64.5±0.2μs       72.1±0.9μs     1.12  groupby.GroupByMethods.time_method('int', 'size')
+         456±2μs         504±20μs     1.11  groupby.GroupByMethods.time_method('float', 'cummin')
+         133±5ms          146±3ms     1.10  groupby.GroupByMethods.time_method('int', 'any')
-        769±20μs         684±30μs     0.89  groupby.GroupByMethods.time_method('int', 'cumprod')
-       893±100μs         598±30μs     0.67  groupby.GroupByMethods.time_method('int', 'sem')
-         145±2ms          275±7μs     0.00  groupby.GroupByMethods.time_method('int', 'bfill')
-         154±3ms          245±1μs     0.00  groupby.GroupByMethods.time_method('int', 'ffill')
-         228±5ms         270±10μs     0.00  groupby.GroupByMethods.time_method('float', 'bfill')
-         237±5ms        232±0.5μs     0.00  groupby.GroupByMethods.time_method('float', 'ffill')

@gfyoung added the Groupby, Missing-data, and Performance labels on Feb 14, 2018
@chris-b1 (Contributor) left a comment:

looks pretty good


N = len(out)

sorted_labels = np.argsort(labels).view(dtype=np.int64)
Contributor:

copy as in #19701

if limit is None:
limit = -1
output = {}
if type(self) is DataFrameGroupBy:
Contributor:

What's this if branch for?

Member Author:

Fill is unique in that, unlike say transformations, the column(s) used in the grouping are also returned as part of the output for a DataFrame object. _iterate_slices does not go over those items and I didn't see an existing method to handle that, so this conditional makes sure the grouped columns still make their way into the frame that gets returned.

Member Author:

FWIW I was wondering whether it's worth adding an instance method to the GroupBy object to handle this (_iterate_groups or something to that effect), but I couldn't think of any other function that requires this feature.

Contributor:

rather than doing the if type(...), make a method in SeriesGroupBy and DataFrameGroupBy

codecov bot commented Feb 15, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@feedf66).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master   #19673   +/-   ##
=========================================
  Coverage          ?   91.65%           
=========================================
  Files             ?      150           
  Lines             ?    48962           
  Branches          ?        0           
=========================================
  Hits              ?    44875           
  Misses            ?     4087           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.03% <100%> (?)
#single 41.8% <17.24%> (?)
Impacted Files Coverage Δ
pandas/core/groupby.py 92.31% <100%> (ø)


['all', 'any', 'bfill', 'count', 'cumcount', 'cummax', 'cummin',
'cumprod', 'cumsum', 'describe', 'ffill', 'first', 'head',
'last', 'mad', 'max', 'min', 'median', 'mean', 'nunique',
'pct_change', 'prod', 'rank', 'sem', 'shift', 'size', 'skew',
Contributor:

Do we bench the compat methods for datetimelikes (timestamp / timedelta)? Most of these would work (except prod, mean, mad, pct_change and a few more), though those likely work for timedeltas. OK with adding an issue to do this as well (IOW it doesn't need to be in this PR).

Member Author:

See #19733

@@ -646,6 +646,7 @@ Performance Improvements
- Improved performance of pairwise ``.rolling()`` and ``.expanding()`` with ``.cov()`` and ``.corr()`` operations (:issue:`17917`)
- Improved performance of :func:`DataFrameGroupBy.rank` (:issue:`15779`)
- Improved performance of variable ``.rolling()`` on ``.min()`` and ``.max()`` (:issue:`19521`)
- Improved performance of :func:`GroupBy.ffill` and :func:`GroupBy.bfill` (:issue:`11296`)
Contributor:

I am not sure this will render. You can just do .groupby().ffill() and so on.

with nogil:
for i in range(N):
idx = sorted_labels[i]
if mask[idx] == 1: # is missing
Contributor:

curr_fill_idx seems potentially not defined (if it is missing but the limit is not hit), not sure if that is possible. maybe could initialize curr_fill_idx = -1?

Member Author:

The initialization occurs at declaration. Can move that into the body of the function if you feel that is more readable

Contributor:

ahh see that ok then

int64_t curr_fill_idx=-1
int64_t idx, filled_vals=0

N = len(out)
Contributor:

maybe want an assert that N == len(labels) == len(mask)

for name, obj in self._iterate_slices():
indexer = np.zeros_like(labels)
mask = isnull(obj.values).view(np.uint8)
libgroupby.group_fillna_indexer(indexer, mask, labels, how,
Contributor:

maybe make a routine that shares code with how the shift_indexer is called (I am talking about in python space here)

curr_fill_idx = idx

out[idx] = curr_fill_idx
# If we move to the next group, reset
Contributor:

blank line here

@WillAyd (Member Author) commented Feb 19, 2018

Latest commit includes a shared function between fill and shift, which can theoretically house any / all implementations with a little clean up as well. As touched on in previous conversations, if we expand this it should probably be in its own module

Open to review and plan to change the location of the func within the module / update docs, but I wanted to push because this commit will cause tests to fail at the following location:

tm.assert_frame_equal(expected, getattr(gb, op)(*args))

The reason this is failing is because I replaced output = {} in the original shift code with output = collections.OrderedDict() in the newly shared function. I'm not entirely clear on why that would impact the test case, but it seems like a more explicit code path regardless.

I believe the test case is wrong anyway and the failing line should either have a .sort_index call or be placed before the expected = expected.sort_index(axis=1) call a few lines ahead of it, but wanted to get your input and make sure I am not misreading that

@jreback (Contributor) commented Feb 19, 2018

The reason this is failing is because I replaced output = {} in the original shift code with output = collections.OrderedDict() in the newly shared function. I'm not entirely clear on why that would impact the test case, but it seems like a more explicit code path regardless.

It's probably happenstance that this worked before - IOW it happened to be sorted already. Go ahead and add what is needed to make it pass.

(And FYI, make this an expected =, rather than code in the assert_frame_equal block.)

@jreback (Contributor) left a comment:

Any reason you want this to be a separate module at this point? It looks OK here.

-------
GroupBy object populated with appropriate result(s)
"""
exp_kwds = collections.OrderedDict([
Contributor:

These should be passed in directly (from the calling function) rather than hard-coded inside this function, no?

Contributor:

Right - maybe the caller passes in a partial with the kwargs baked in, rather than the string version of the function?

_get_cythonized_result(
    partial(pd._libs.groupby.group_shift_indexer, nperiods=2),
    ...
)

Member Author (Feb 19, 2018):

I'll think more about that. The one complicating factor is that masked values cannot be passed in directly as they are calculated on a per-slice basis. To avoid a convoluted set of calls I figured it would make sense to have all keywords and positional arguments resolved within one function, but there are certainly other ways to go about this.

Contributor:

@chris-b1 comment is about the exp_kwds, I am still not clear why the caller cannot pass these

if needs_ngroups:
func = partial(func, ngroups)

# Convert any keywords into positional arguments
Contributor:

you can pass kwargs to cython functions, so not sure this is necessary

Member Author:

I explicitly converted kwargs into positional arguments from the first comment back in #19481. If we are OK with kwargs in the Cython layer I can rewrite this, as it could simplify a few things

Contributor:

yes, you can pass kwargs

base_func = getattr(libgroupby, how)

for name, obj in self._iterate_slices():
indexer = np.zeros_like(labels)
Contributor:

do we need to specify int64? (or platform int)?

Contributor:

yes, int64

Member Author:

This was a copy/paste of existing code, so I didn't give it too much thought. That said, labels is already wrapped with an _ensure_int64 call as part of group_info within the BaseGrouper, so unless there's something in particular you know of, I figure it would be easiest to inherit the type from labels and not override it.

def _fill(self, direction, limit=None):
"""Overridden method to concat grouped columns in output"""
res = super()._fill(direction, limit=limit)
output = collections.OrderedDict()
Contributor:

would be slightly simpler as a list-comprehension

Contributor:

though maybe this should be an arg to _fill? (IOW the option to do it should be done in fill)

Member Author:

Do you mean as an arg to _get_cythonized_result (fill would always use it, unless we wanted to implement a new feature)? I was thinking about that, but given that fill on a DataFrameGroupBy is the only operation I know of at the moment that includes the grouping in the body of the returned object, I figured it made the most sense to just perform that action there.
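For illustration, the behavior under discussion - the grouped column(s) appearing in the filled DataFrame output - can be approximated in plain pandas; the concat below is a simplified stand-in for what the override does, not the PR's actual code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'k': [1, 1, 2], 'v': [1.0, np.nan, 2.0]})
filled = df.groupby('k').ffill()

# Depending on the pandas version, the grouping column may or may not be
# included in the output; re-attach it if it was dropped
if 'k' not in filled.columns:
    filled = pd.concat([df[['k']], filled], axis=1)

print(filled['v'].tolist())  # [1.0, 1.0, 2.0]
```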

indexer = np.zeros_like(labels)
func = partial(base_func, indexer, labels)
if needs_mask:
mask = isnull(obj.values).astype(np.uint8, copy=False)
Contributor:

Contrary to the int64 case, you do want to use view here - numpy doesn't see bool and uint8 as the 'same' type, so will always copy, which view elides.

Member Author:

Thanks for the heads up - still trying to wrap my head around some of the nuances there. Out of curiosity, how do you know this? Just from experience or is there a good reference?

I do see that astype documentation mentions dtype, order and subok requirements need to be satisfied to prevent copy, but I'm not clear on what those requirements are and if they are documented

Contributor:

Unfortunately all I can really point to is trial and error. Some "general" rules, at least in a pandas context

  1. copy=False will only elide a copy if the dtype is identical (there may be contradictions to this, but at least not in general). Useful for conditional casts.
  2. arr.view(dtype) never copies. Viewing across types should only be used if the itemsizes of the two types are identical. Primarily useful for casting a type with some metadata down to a more primitive type (bool to uint8, datetime64 to int64).
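Both rules can be checked directly with np.shares_memory:

```python
import numpy as np

arr = np.array([True, False, True])

v = arr.view(np.uint8)                # reinterprets the same buffer - never copies
a = arr.astype(np.uint8, copy=False)  # dtypes differ (bool vs uint8), so numpy copies anyway

print(np.shares_memory(arr, v))  # True
print(np.shares_memory(arr, a))  # False

same = arr.astype(bool, copy=False)   # identical dtype - the copy is elided
print(np.shares_memory(arr, same))    # True
```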

@WillAyd (Member Author) commented Feb 19, 2018

To clarify my point about having a separate module for _get_cythonized_result: I don't think that method as a stand-alone needs its own module, but thinking long term, and going back to the discussion in #19481, a class to more effectively dispatch all of the functions (agg and transform included) may make sense as a sub-module of groupby.py, and _get_cythonized_result could potentially evolve into that.

@jreback (Contributor) commented Feb 19, 2018

@WillAyd ahh perfect - yes, absolutely in favor of a class dispatch mechanism, and it should be in its own cython/python modules as needed. OK with merging this (small fix-ups) then attacking that (any/all can be in the current or new framework).

pep8speaks commented Feb 19, 2018

Hello @WillAyd! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 24, 2018 at 16:41 UTC

@@ -1457,6 +1457,15 @@ def expanding(self, *args, **kwargs):
from pandas.core.window import ExpandingGroupby
return ExpandingGroupby(self, *args, **kwargs)

def _fill(self, direction, limit=None):
# Need int value for Cython
Contributor:

can you add a doc-string

int64_t lab, idxer, idxer_slot
int64_t[:] label_seen = np.zeros(ngroups, dtype=np.int64)
int64_t[:, :] label_indexer

periods = kwargs['periods']
Contributor:

Hmm - if you are not doing a .get() here, is there a reason you are not simply having periods=None in the signature (if it's optional)?

ndarray[int64_t] sorted_labels
int64_t limit, idx, curr_fill_idx=-1, filled_vals=0

direction = kwargs['direction']
Contributor:

Same again - why not simply list them in the signature? (You can have **kwargs if you want to ignore other ones.)

Member Author:

I didn't want to define a default value for something like periods in both the Python and Cython layers, preferring to let the former set the default and be responsible for passing it to Cython.

It's a pretty trivial change so will re-push with your changes reflected soon

Member Author:

Think I've been looking at this change too long...planning on making these required in the Cython signature and using the existing defaults in the Python layer to populate those via **kwargs. If we are not on the same page let me know

Contributor:

no that sounds good

@WillAyd (Member Author) commented Feb 24, 2018

Travis failure on latest commit looks unrelated (something with pyarrow?)

@jreback jreback merged commit d87ca1c into pandas-dev:master Feb 25, 2018
@jreback (Contributor) commented Feb 25, 2018

thanks @WillAyd very nice as always!

Merging this pull request closes: PERF: groupby-fillna perf, implement in cython